Foundations of Machine Learning Frameworks Lab-2

Name: Devanshi Joshi
Id: 8868052

In [3]:
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import plotly.graph_objects as go
import plotly.express as px
import plotly.offline as pyo

# Enable inline rendering of Plotly figures in the notebook
pyo.init_notebook_mode()

Line Plot: Monthly Sales Over a Year

The code plots monthly sales for one year as a line, with a marker at each month and the sales value printed above each marker. The month with the highest sales is highlighted in red, so the chart shows both the overall trend and the peak month at a glance.

In [4]:
# Defining Sample Data
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
sales = [11000, 13000, 12000, 16000, 15000, 19000, 13000, 14000, 15000, 20000, 11000, 12000]

plt.figure(figsize=(10, 6))  
plt.plot(months, sales, marker='o', linestyle='-', color='purple', linewidth=2, markersize=8, label='Monthly Sales')
plt.title('Monthly Sales Over a Year', fontsize=16)  
plt.xlabel('Month', fontsize=12) 
plt.ylabel('Sales (in $)', fontsize=12) 
plt.grid(True, linestyle='--', alpha=0.7)

# Adding the data points as text labels above markers
for month, sale in zip(months, sales):
    plt.text(month, sale + 200, f'{sale:,}', ha='center', fontsize=10, color='b')

# Customizing the appearance of the markers
plt.scatter(months, sales, color='b', s=80, zorder=3)

# Highlighting the maximum sales point
max_sale = max(sales)
max_sale_month = months[sales.index(max_sale)]
plt.scatter(max_sale_month, max_sale, color='red', s=100, label=f'Max Sales ({max_sale:,})', zorder=4)

plt.legend(fontsize=12)
plt.tight_layout()  
plt.show()
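
The peak-finding step used for the red marker can also be written as a small reusable helper. The sketch below is only an illustration: `find_peak` is a hypothetical name that does not appear in the lab code, and it returns the first entry when several months tie for the maximum.

```python
def find_peak(labels, values):
    """Return the (label, value) pair with the largest value.

    If several entries tie for the maximum, the first one is returned.
    """
    if len(labels) != len(values):
        raise ValueError("labels and values must have the same length")
    # Search by index so that the label matching the maximum can be looked up
    peak_index = max(range(len(values)), key=values.__getitem__)
    return labels[peak_index], values[peak_index]


months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
sales = [11000, 13000, 12000, 16000, 15000, 19000, 13000, 14000, 15000, 20000, 11000, 12000]

peak_month, peak_sales = find_peak(months, sales)
print(f"Peak month: {peak_month} ({peak_sales:,})")
```

Keeping the lookup in one function avoids calling `max(sales)` several times and makes the tie-breaking rule explicit.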

Bar Chart: Product Sales Comparison

The code draws a bar chart comparing the sales of four products: smartphones, laptops, headphones, and tablets. Each bar's height represents that product's sales, and the exact figure is printed above the bar, which makes the products easy to compare directly.

In [5]:
# Defining Sample Data
products = ['Smartphone', 'Laptop', 'Headphones', 'Tablet']
sales = [25000, 30000, 22000, 28000]

plt.figure(figsize=(8, 6)) 
bars = plt.bar(products, sales, color='m')

# Adding the data labels, centered above each bar
for bar, sale in zip(bars, sales):
    plt.text(bar.get_x() + bar.get_width() / 2, bar.get_height() + 500, str(sale), ha='center', fontsize=12, color='black')

plt.title('Product Sales Comparison')
plt.xlabel('Products')
plt.ylabel('Sales (in $)')
plt.ylim(0, 35000)
plt.xticks(rotation=45, ha='right')  
plt.grid(axis='y', linestyle='--', alpha=0.7)

plt.tight_layout() 
plt.show()
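
The comparison the chart conveys visually can also be summarized numerically. The following is a minimal sketch, not part of the original lab, that ranks the same sample products from highest to lowest sales and prints each product's share of the total.

```python
products = ['Smartphone', 'Laptop', 'Headphones', 'Tablet']
sales = [25000, 30000, 22000, 28000]

# Pair each product with its sales and sort from highest to lowest
ranked = sorted(zip(products, sales), key=lambda pair: pair[1], reverse=True)
total = sum(sales)

for product, sale in ranked:
    # Left-align the name, right-align the figure, then show the share of total sales
    print(f"{product:<12}{sale:>8,}  {sale / total:6.1%}")
```

Passing the sorted names and values to `plt.bar` instead of the original lists would draw the bars in ranked order.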

Histogram: Age Distribution

The code draws 1,000 random ages from a normal distribution with a mean of 30 and a standard deviation of 5, then plots them as a purple histogram with 20 bins. A red dashed vertical line marks the sample mean, and two green dashed lines mark one standard deviation above and below it.

In [6]:
# Generating a random sample of age data
ages = np.random.normal(30, 5, 1000)  

plt.figure(figsize=(8, 6)) 
plt.hist(ages, bins=20, color='purple', alpha=0.7, edgecolor='black')  
plt.title('Age Distribution')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.grid(axis='y', linestyle='--', alpha=0.7)

# Adding vertical lines for denoting the mean and standard deviation
mean_age = np.mean(ages)
std_age = np.std(ages)
plt.axvline(mean_age, color='red', linestyle='dashed', linewidth=2, label=f'Mean Age ({mean_age:.2f})')
plt.axvline(mean_age + std_age, color='green', linestyle='dashed', linewidth=2, label=f'Std Deviation ({std_age:.2f})')
plt.axvline(mean_age - std_age, color='green', linestyle='dashed', linewidth=2)

plt.legend()  
plt.show()
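
The two green lines bracket the interval of one standard deviation around the mean. For normally distributed data roughly 68% of the values are expected to fall inside that interval, and the sketch below checks this on a fresh sample. The seeded generator and the seed value are assumptions made for reproducibility; the lab cell itself draws unseeded data.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for this sketch
ages = rng.normal(30, 5, 1000)  # same mean, standard deviation, and size as the lab cell

mean_age = ages.mean()
std_age = ages.std()

# Boolean mask of ages lying within one standard deviation of the mean
within_one_std = np.abs(ages - mean_age) <= std_age
share = within_one_std.mean()  # the mean of a boolean array is the fraction of True values

print(f"Share within one standard deviation: {share:.1%}")
```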

Scatter Plot: Study Hours vs. Exam Scores

The code generates random integer data for study hours and exam scores, using a fixed seed so that the same values are produced on every run. Seaborn then plots exam scores against study hours as a scatter plot with purple square markers, a title, and axis labels. Because the two arrays are drawn independently, no real correlation between them should be expected.

In [7]:
# Generating Random Data for study hours and exam scores
np.random.seed(42)  # Setting a seed for reproducibility
study_hours = np.random.randint(1, 11, 9) 
exam_scores = np.random.randint(50, 101, 9)  

sns.set(style="whitegrid")
sns.scatterplot(x=study_hours, y=exam_scores, color='purple', marker='s', s=100, alpha=0.8)
plt.title('Study Hours vs. Exam Scores', fontsize=16)  
plt.xlabel('Study Hours', fontsize=12) 
plt.ylabel('Exam Scores', fontsize=12)  
plt.grid(True, linestyle='--', alpha=0.7)

plt.tight_layout()  # Ensure all elements fit within the figure
plt.show()
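
A scatter plot shows the relationship only visually; the Pearson correlation coefficient puts a number on it. The sketch below regenerates the same seeded data and computes the coefficient with `np.corrcoef`. It is an addition for illustration and is not part of the original lab.

```python
import numpy as np

np.random.seed(42)  # same seed as the lab cell, so the same data is generated
study_hours = np.random.randint(1, 11, 9)
exam_scores = np.random.randint(50, 101, 9)

# corrcoef returns a 2x2 correlation matrix; the off-diagonal entry is Pearson's r
r = np.corrcoef(study_hours, exam_scores)[0, 1]
print(f"Pearson correlation: {r:.3f}")
```

With only nine independently drawn points, the coefficient reflects chance rather than any real effect of study time on scores.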

Box Plot: Sepal Length by Species (Iris Dataset)

This code uses the Iris dataset bundled with Seaborn to draw a box plot of sepal length for each iris species. The boxes use the "flare" color palette, and custom line styles are applied to the box outlines, whiskers, and median lines. The plot makes it easy to compare the central tendency and spread of sepal length across species.

In [8]:
# Using the iris data set of seaborn 
data = sns.load_dataset('iris')

# Customizing the appearance of the boxes and whiskers
boxprops = dict(linestyle='--', linewidth=2, edgecolor='black')  # Dashed line style for boxes
whiskerprops = dict(linestyle='-', linewidth=2, color='black')  # Solid line style for whiskers
medianprops = dict(linestyle='-', linewidth=2, color='black')  # Solid line style for median lines

# Creating a single box plot using the iris dataset and the "flare" palette
sns.set(style="whitegrid")
sns.boxplot(x="species", y="sepal_length", data=data, palette="flare", width=0.6,
            boxprops=boxprops, whiskerprops=whiskerprops, medianprops=medianprops)
plt.title('Box Plot of Sepal Length by Species')
plt.xlabel('Species', fontsize=12)
plt.ylabel('Sepal Length (cm)', fontsize=12)

plt.tight_layout()
plt.show()
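
To make explicit what each box, median line, and whisker represents, the sketch below computes the underlying statistics by hand. It assumes the common 1.5 × IQR whisker rule, which is the default in Matplotlib and Seaborn; `box_stats` is a hypothetical helper and the input values are made up for illustration.

```python
import numpy as np


def box_stats(values, whis=1.5):
    """Compute the statistics a box plot draws, using the whis * IQR rule."""
    values = np.sort(np.asarray(values, dtype=float))
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr
    # Whiskers end at the most extreme data points that are still inside the fences
    inside = values[(values >= lower_fence) & (values <= upper_fence)]
    outliers = values[(values < lower_fence) | (values > upper_fence)]
    return {
        'q1': q1,
        'median': median,
        'q3': q3,
        'whisker_low': inside.min(),
        'whisker_high': inside.max(),
        'outliers': outliers.tolist(),
    }


# Made-up sample with one obvious outlier
stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(stats)
```

Applying the same helper to the sepal lengths of one species would give the numbers behind that species' box.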

Heatmap: Passenger Counts by Month and Year (Flights Dataset)

This code uses Seaborn to draw a heatmap of airline passenger counts over time. The 'flights' dataset is loaded and pivoted into a table with months as rows and years as columns. The heatmap uses the purple-red "PuRd" colormap, prints the count in each cell, and separates the cells with thin gridlines, which makes monthly and yearly patterns in passenger numbers easy to see.

In [9]:
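# Using the flights data set of seaborn
data = sns.load_dataset('flights')
# Each (month, year) pair has a single row, so pivot() reshapes without aggregating
# and keeps integer counts, which the integer annotation format fmt='d' requires
flights_data = data.pivot(index='month', columns='year', values='passengers')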

plt.figure(figsize=(10, 6))
sns.heatmap(flights_data, cmap='PuRd', annot=True, fmt='d', linewidths=.5)
plt.title('Passenger Counts by Month and Year')
plt.xlabel('Year')
plt.ylabel('Month')
plt.show()
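
The pivot step is what turns the dataset's long format (one row per month and year) into the wide table the heatmap needs. The sketch below shows the same reshaping on a tiny made-up table; the values are invented for illustration and are not taken from the flights dataset.

```python
import pandas as pd

# Long format: one row per (year, month) observation, with made-up counts
long_df = pd.DataFrame({
    'year': [2001, 2001, 2002, 2002],
    'month': ['Jan', 'Feb', 'Jan', 'Feb'],
    'passengers': [10, 12, 14, 18],
})

# Wide format: months become rows, years become columns
wide = long_df.pivot(index='month', columns='year', values='passengers')
print(wide)
```

Each cell of `wide` corresponds to one colored cell of a heatmap drawn from it.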

3D Surface Plot: Example Surface Plot

Using Plotly, this code creates a 3D surface plot. It builds a grid of x and y values with 100 points in each direction over the range -5 to 5, evaluates the saddle-shaped function Z = X**2 - Y**2 on that grid, and renders the result as an interactive surface with a title and labeled X, Y, and Z axes.

In [10]:
# Create a grid of x and y values
x = np.linspace(-5, 5, 100)
y = np.linspace(-5, 5, 100)
X, Y = np.meshgrid(x, y)

# Define a function (e.g., a saddle-shaped surface)
Z = X**2 - Y**2

# Create a 3D surface plot with Plotly
fig = go.Figure(data=[go.Surface(z=Z, x=X, y=Y)])
fig.update_layout(title='3D Surface Plot', scene=dict(xaxis_title='X', yaxis_title='Y', zaxis_title='Z'))
fig.show()
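
The role of `np.meshgrid` is easier to see on a small grid. The sketch below uses 5 points per axis instead of 100 (a choice made only for readability) and checks the saddle shape: the surface rises along the x direction and falls along the y direction.

```python
import numpy as np

x = np.linspace(-2, 2, 5)  # [-2, -1, 0, 1, 2]
y = np.linspace(-2, 2, 5)

# With the default 'xy' indexing, X[i, j] == x[j] and Y[i, j] == y[i]
X, Y = np.meshgrid(x, y)
Z = X**2 - Y**2

print(X.shape, Y.shape, Z.shape)
print("center:", Z[2, 2])             # x = 0, y = 0
print("along x:", Z[2, 0], Z[2, 4])   # y = 0, x = -2 and x = 2
print("along y:", Z[0, 2], Z[4, 2])   # x = 0, y = -2 and y = 2
```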

Choropleth Map: Population by Country (Gapminder Dataset)

Using Plotly Express, this code creates an animated choropleth map of total population by country from 1952 to 2007, based on the Gapminder dataset. Each animation frame is one year, and countries are colored by population on the Plasma color scale. The color range is capped at half of the largest population value, so any country above the cap is drawn in the top color and differences among smaller countries remain visible. Coastlines are drawn in black, the land color is set to white, and a color legend shows the population scale.

In [11]:
# Loading the sample data (population by country and year)
data = px.data.gapminder()
fig = px.choropleth(data, 
                    locations='iso_alpha', 
                    color='pop', 
                    hover_name='country',
                    animation_frame='year', 
                    range_color=[0, data['pop'].max() * 0.5],
                    color_continuous_scale=px.colors.sequential.Plasma,  
                    projection="natural earth") 

fig.update_geos(showcoastlines=True, coastlinecolor="Black", showland=True, landcolor="white") 
fig.update_layout(title='Population by Country (1952-2007)')
fig.show()
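
The effect of capping `range_color` can be sketched with plain NumPy. The code below approximates how a value is mapped to a position on a continuous color scale when the range has a lower and an upper bound; it is an illustration of the idea, not Plotly's internal implementation. `normalize_to_range` is a hypothetical helper and the population figures are made up.

```python
import numpy as np


def normalize_to_range(values, lower, upper):
    """Map values to positions in [0, 1] on a color scale, clipping at both ends."""
    values = np.asarray(values, dtype=float)
    return np.clip((values - lower) / (upper - lower), 0.0, 1.0)


# Made-up populations spanning several orders of magnitude
populations = np.array([1e6, 5e7, 3e8, 1.3e9])

# Same capping rule as the lab cell: upper bound at half of the maximum
cap = populations.max() * 0.5
positions = normalize_to_range(populations, 0, cap)
print(positions)
```

Every value at or above the cap lands at position 1.0, which is why the largest countries share the top color while smaller ones spread over more of the scale.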